Bacillus amyloliquefaciens HB-26, a Gram-positive bacterium, was isolated from soil in China.
SDS-PAGE analysis showed that this strain secreted six major protein bands of 65, 60, 55, 34, 25
and 20 kDa. A bioassay revealed that the strain shows specific activity against Plasmodiophora
brassicae and nematodes. Here we describe the features of this organism, together with the draft
genome sequence and annotation. The 3,989,358 bp long genome (39 contigs) contains 4,001
protein-coding genes and 80 RNA genes.



Introduction 

Bacillus amyloliquefaciens is a species of bacterium in the genus Bacillus that is closely
related to Bacillus subtilis. During growth, B. amyloliquefaciens can produce numerous
antimicrobial or, more generally, bioactive metabolites with well-established activity in
vitro, such as surfactin, iturin and fengycin [1,2]. The production of all of these antibiotic
compounds highlights B. amyloliquefaciens as a good candidate for the development of
biocontrol agents [3,4].

Strain HB-26 belongs to the species B. amyloliquefaciens. The strain produces many bioactive
metabolites showing specific activity against Plasmodiophora brassicae, which causes clubroot,
one of the most serious diseases of brassica crops worldwide [5-7]. Heavy infection of Chinese
cabbage, cabbage, broccoli, turnip, oilseed rape, and other crucifers by this pathogen can lead
to severe economic losses [8-11]. The root systems of infected plants show gall formation,
which inhibits nutrient and water transport, stunts plant growth, and increases susceptibility
to wilting [12,13]. In addition, bioassay results showed that strain HB-26 also had some
activity against root-knot nematodes.




Here, we present a summary classification and a set of features for B. amyloliquefaciens
HB-26, together with a description of the genome sequencing and annotation, in order to
improve the understanding of the molecular basis for its ability to inhibit Plasmodiophora
brassicae and nematodes.

Classification and features 

Strain HB-26 colonies were milky white and matte with a wrinkled surface. Microscopic
observation indicated that it was a Bacillus species (Figure 1A, Figure 1B and Table 1).
SDS-PAGE analysis showed that this strain secreted six major protein bands of 65, 60, 55, 34,
25 and 20 kDa (Figure 1C).

A representative genomic 16S rDNA sequence of strain HB-26 was searched against the GenBank
database using BLAST [29]. Sequences showing more than 99% sequence identity to the 16S rDNA
of HB-26 were selected for phylogenetic analysis, and 15 sequences were aligned with the
ClustalW algorithm. The tree was reconstructed by the neighbor-joining method, using the
Kimura 2-parameter model for distance calculation. The phylogenetic tree was assessed by
bootstrapping with 1,000 replicates, and the consensus tree is shown in Figure 2.
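
As an illustration of the distance calculation named above, the following minimal sketch
computes the Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and
Q are the proportions of transitions and transversions between two aligned sequences. The
function name and the handling of gaps (sites with a gap or ambiguous base are skipped) are
assumptions made for this example; it is not the MEGA implementation used for Figure 2.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: skip sites with gaps or ambiguous bases (pairwise deletion).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A neighbor-joining tree is then built from the matrix of such pairwise distances; bootstrap
support is obtained by resampling alignment columns with replacement and repeating the
procedure.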
Figure 1. General characteristics of B. amyloliquefaciens HB-26. (A) Colony morphology of
strain HB-26. (B) Phase contrast micrograph of HB-26. (C) SDS-PAGE analysis of proteins of
HB-26. Lane M, protein molecular weight marker; Lane 1, proteins of strain HB-26.



Table 1. Classification and general features of B. amyloliquefaciens HB-26 



MIGS ID     Property                 Term                                  Evidence code^a

            Current classification   Domain Bacteria                       TAS [14]
                                     Phylum Firmicutes                     TAS [15-17]
                                     Class Bacilli                         TAS [18,19]
                                     Order Bacillales                      TAS [20,21]
                                     Family Bacillaceae                    TAS [20,22]
                                     Genus Bacillus                        TAS [20,23,24]
                                     Species Bacillus amyloliquefaciens    TAS [25-27]
            Gram stain               Gram-positive                         NAS
            Cell shape               rod-shaped                            IDA
            Motility                 motile                                NAS
            Sporulation              spore-forming                         IDA
            Temperature range        room temperature                      NAS
            Optimum temperature      pH 7.0                                IDS
            Carbon source            organic carbon source                 NAS
            Energy source            organic carbon source                 NAS
MIGS-6      Habitat                  soil                                  IDA
MIGS-6.3    Salinity                 salt tolerant                         NAS
MIGS-22     Oxygen                   aerobic                               NAS
MIGS-14     Pathogenicity            avirulent                             NAS
MIGS-4      Geographic location      Hubei, China                          IDA
MIGS-4.1    Latitude                 30.07N
MIGS-4.2    Longitude                112.23E
MIGS-4.3    Depth                    5-10 cm
MIGS-4.4    Altitude                 about 35 m
MIGS-5      Sample collection time   2009                                  IDA

^a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a
direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not
directly observed for the living, isolated sample, but based on a generally accepted property
for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology
project [28].



Figure 2. Neighbor-joining phylogenetic tree based on 16S rRNA sequences, constructed with the
MEGA 4 software [30]. The strains and their corresponding GenBank accession numbers for 16S
rDNA sequences are: A: B. amyloliquefaciens ML581 (KC692179.1); B: B. amyloliquefaciens JM-21
(KC752450.1); C: Bacillus strain HB-26 (HM138476); D: B. vallismortis WA3-7 (JF496475.1);
E: Bacillus sp. BYK1448 (HF549161.1); F: B. subtilis 2B (KF112078.1); G: B. methylotrophicus
GZGL8 (JN999861.1); H: B. vallismortis D20 (KC441761.1); I: B. tequilensis L10 (JN700126.1);
J: Bacillus sp. C4(2013) (KC310834.1); K: B. subtilis WBZ (KC460988.1); L: B.
amyloliquefaciens CA81 (KF040978.1); M: Bacillus sp. SWB30 (JX861886.1); N: B.
methylotrophicus Ns7-22 (HQ831412.1); O: B. subtilis 26A (KC295415.1).



Genome sequencing information 

Genome project history 

This Bacillus strain was selected for sequencing due to its specific activity against
Plasmodiophora brassicae and nematodes. The high-quality draft genome sequence is deposited
in GenBank. The Beijing Genomics Institute (BGI) performed the sequencing, and NCBI staff
used the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) to complete the
annotation. A summary of the project is given in Table 2.



Table 2. Project information 



MIGS ID      Property                  Term

MIGS-31      Finishing quality         Draft
MIGS-28      Libraries used            One genomic library (Illumina paired-end, 700 bp insert size)
MIGS-29      Sequencing platform       Illumina HiSeq 2000
MIGS-31.2    Sequencing coverage       192×
MIGS-30      Assemblers                SOAPdenovo version 1.05
MIGS-32      Gene calling method       Glimmer and GeneMark
             GenBank Date of Release   August 31, 2016
             NCBI project ID           AUWK00000000
             Project relevance         Agricultural
Growth conditions and DNA isolation

B. amyloliquefaciens HB-26 was grown in 50 mL Luria broth for 6 h at 28°C. DNA was isolated
by incubating the cells with lysozyme (10 mg/mL) in 2 mL TE (50 mM Tris base, 10 mM EDTA,
20% sucrose, pH 8.0) at 4°C for 6 h. Then 4 mL of 2% SDS was added and the mixture was
incubated at 55°C for 30 min; 2 mL of 5 M NaCl was added, and the mixture was incubated at
4°C for 10 min. DNA was purified by organic extraction and ethanol precipitation.

Genome sequencing and assembly 

The genome of B. amyloliquefaciens HB-26 was sequenced on the Illumina HiSeq 2000 platform,
using 251-bp paired-end reads from a 700-bp genomic library. Reads with average quality
scores below Q30 or with more than 3 unidentified nucleotides were eliminated. A total of
2,605,589 paired-end reads (achieving ~192-fold coverage [0.94 Gb]) were assembled de novo
using SOAPdenovo version 1.05 [9]. The assembly consists of 39 contigs arranged in 39
scaffolds with a total size of 3,989,358 bp (including chromosome and plasmids).
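
The read-filtering rule described above can be sketched as follows. This is only an
illustration of the two stated criteria (mean quality of at least Q30, no more than 3
unidentified nucleotides); the function names, the Phred+33 quality encoding and the
treatment of empty reads are assumptions of this example, not details reported for the
actual pipeline.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (assumed Phred+33 encoding)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(sequence: str, quality: str,
              min_mean_q: float = 30.0, max_n: int = 3) -> bool:
    """Return True if a read passes both filters described in the text."""
    # Assumption: empty or malformed records are discarded.
    if not sequence or len(sequence) != len(quality):
        return False
    # Discard reads with more than `max_n` unidentified nucleotides (N).
    if sequence.upper().count("N") > max_n:
        return False
    # Discard reads whose average quality score is below the threshold.
    return mean_phred(quality) >= min_mean_q
```

For paired-end data, a pair would typically be kept only if both mates pass; whether that
convention was applied here is not stated.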

Genome annotation 

Genome annotation was completed using the Prokaryotic Genomes Automatic Annotation Pipeline
(PGAAP). Briefly, protein-coding genes were predicted using a combination of GeneMark and
Glimmer [31-33]. Ribosomal RNAs were predicted by sequence similarity searching using BLAST
against an RNA sequence database and/or using Infernal and Rfam models [34,35]. Transfer
RNAs were predicted using tRNAscan-SE [36]. In order to detect missing genes, a complete
six-frame translation of the nucleotide sequence was done and predicted proteins (generated
above) were masked. All predictions were then searched using BLAST against all proteins from
complete microbial genomes. Annotation was based on comparison to protein clusters and on
the BLAST results. Conserved Domain Database and Cluster of Orthologous Group information
was then added to the annotation.
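
The six-frame translation step mentioned above can be illustrated with a short sketch. It
translates the three reading frames of each strand using the standard amino-acid assignments
(shared by the bacterial genetic code); the function names and the use of "X" for codons
containing non-ACGT characters are assumptions of this example and do not reproduce the
PGAAP code.

```python
from itertools import product

# Codon table in NCBI order (bases T, C, A, G; first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Return the translations of frames +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames
```

Regions already covered by predicted proteins would then be masked before the remaining
translated segments are searched with BLAST.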



Table 3. Nucleotide content and gene count levels of the genome 


Attribute                                    Value        % of total^a

Genome size (bp)                             3,989,358    100.00
DNA coding region (bp)                       3,486,615    87.39
DNA G+C content (bp)                         1,889,758    47.37
Number of scaffolds                          39
Extrachromosomal elements                    unknown
Total genes                                  4,114        100.00
tRNA genes                                   76           1.85
rRNA genes                                   4            0.1
rRNA operons^b
Protein-coding genes                         4,001        97.25
Pseudo genes (partial genes)                 0 (36)       0 (0.87)
Genes with function prediction (proteins)    2,224        54.06
Genes assigned to COGs                       2,336        56.78
Genes with signal peptides                   328          7.97
CRISPR repeats                               0            0

^a The total is based on either the size of the genome in base pairs or the total number
of protein-coding genes in the annotated genome.

^b None of the rRNA operons appears to be complete due to unresolved assembly
problems.
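
As a worked check of the "% of total" column, the short sketch below recomputes several
percentages from the counts in Table 3. The variable and function names are illustrative.
Note that the gene percentages in the table appear to be reproduced when the 4,114 total
genes are used as the denominator; that choice is an inference from the arithmetic, not a
statement from the annotation pipeline.

```python
GENOME_SIZE_BP = 3_989_358
GC_BP = 1_889_758
TOTAL_GENES = 4_114  # assumed denominator for the gene percentages


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


gene_counts = {
    "tRNA genes": 76,
    "Protein-coding genes": 4_001,
    "Genes with function prediction": 2_224,
    "Genes assigned to COGs": 2_336,
    "Genes with signal peptides": 328,
}

gene_percentages = {name: percent(count, TOTAL_GENES)
                    for name, count in gene_counts.items()}
gc_percent = percent(GC_BP, GENOME_SIZE_BP)
```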
Genome properties

The draft assembly of the genome consists of 39 contigs in 39 scaffolds, with an overall
47.37% G+C content. Of the 4,114 genes predicted, 4,001 were protein-coding genes, and 80
RNAs were also identified. The majority of the protein-coding genes (54.06%) were assigned a
putative function, while the remaining ones were annotated as hypothetical proteins. The
distribution of genes into COG functional categories is presented in Table 3, Table 4 and
Figure 3.

Table 4. Number of genes associated with the 25 general COG functional categories 



Code   Value   %age^a    Description

J      130     3.160     Translation, ribosomal structure and biogenesis
A      0       0.0       RNA processing and modification
K      262     6.368     Transcription
L      122     2.965     Replication, recombination and repair
B      1       0.024     Chromatin structure and dynamics
D      34      0.826     Cell cycle control, cell division, chromosome partitioning
Y      0       0         Nuclear structure
V      52      1.264     Defense mechanisms
T      153     3.719     Signal transduction mechanisms
M      182     4.424     Cell wall/membrane/envelope biogenesis
N      53      1.288     Cell motility
Z      0       0.000     Cytoskeleton
W      1       0.024     Extracellular structures
U      43      1.045     Intracellular trafficking, secretion, and vesicular transport
O      97      2.358     Posttranslational modification, protein turnover, chaperones
C      177     4.302     Energy production and conversion
G      249     6.053     Carbohydrate transport and metabolism
E      340     8.264     Amino acid transport and metabolism
F      79      1.920     Nucleotide transport and metabolism
H      123     2.990     Coenzyme transport and metabolism
I      117     2.844     Lipid transport and metabolism
P      205     4.983     Inorganic ion transport and metabolism
Q      116     2.820     Secondary metabolites biosynthesis, transport and catabolism
R      435     10.574    General function prediction only
S      287     6.976     Function unknown
-      856     20.81     Not in COGs

^a The total is based on the total number of protein-coding genes in the annotated genome.
Figure 3. Graphical circular map of the Bacillus amyloliquefaciens HB-26 genome. From the
outside to the center: genes on forward strand (colored by COG categories), genes on reverse
strand (colored by COG categories), GC content, GC skew. The map was generated with the
CGView Server (Stothard Research Group: http://stothard.afns.ualberta.ca/cgview_server/).
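
The two inner rings of such a map can be approximated with a simple sliding-window
calculation. The sketch below uses the common definitions GC content = (G + C) / window
length and GC skew = (G - C) / (G + C); the function name, window size and step are
assumptions of this example, and the CGView Server may compute or scale these tracks
differently.

```python
def gc_content_and_skew(seq: str, window: int = 1000, step: int = 1000) -> list:
    """Return (start, GC content, GC skew) for each full window along `seq`."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        # Skew is undefined when a window has no G or C; report 0.0 in that case.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results
```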